Statistical model selection with “Big Data”
Abstract
Big Data offer potential benefits for statistical modelling, but confront problems like an excess of false positives, mistaking correlations for causes, ignoring sampling biases, and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include: embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem); using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure that retains available theory insights (the selection problem); testing that relationships are well specified and invariant to shifts in explanatory variables (the evaluation problem); and using a viable approach that resolves the computational problem of immense numbers of possible models.
JEL classifications: C52, C22.
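The excess of false positives and the case for tight significance levels can be illustrated with a small calculation. The following is a minimal sketch, not Autometrics and not the authors' procedure: it assumes that every candidate variable is irrelevant, that the candidates are independent, and that each t-statistic is approximately standard normal under the null. The function names and the figure of 1000 candidates are hypothetical, chosen only for illustration.

```python
import random
from statistics import NormalDist


def expected_false_positives(n_irrelevant: int, alpha: float) -> float:
    """Expected number of irrelevant variables retained by chance at level alpha."""
    return n_irrelevant * alpha


def simulate_false_positives(n_irrelevant: int, alpha: float, seed: int = 0) -> int:
    """Count simulated null t-statistics that exceed the two-sided critical value.

    Assumption: each null t-statistic is drawn as an independent standard normal.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return sum(abs(rng.gauss(0.0, 1.0)) > critical for _ in range(n_irrelevant))


if __name__ == "__main__":
    n = 1000  # hypothetical number of irrelevant candidate variables
    for alpha in (0.05, 0.01, 0.001):
        expected = expected_false_positives(n, alpha)
        observed = simulate_false_positives(n, alpha)
        print(f"alpha={alpha}: expected {expected:.1f}, simulated {observed}")
```

Under these assumptions the expected number of chance retentions is simply the number of candidates times the significance level, so tightening the level from 0.05 to 0.001 reduces it fifty-fold for the same set of candidates.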
Related resources
Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm
This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...
Presenting a model for optimized selection of certified public accountants based on compliance with code of ethics for professional accountants with personality trait approach
Personality is one way of describing human characteristics and is usually associated with stable features. On the other hand, much research evidence regarding the Big Five personality traits has accumulated over the years. The current research presents a practical model for the optimized selection of certified public accountants based on their personality traits. This study is of causal and ...
Robust Signed-Rank Variable Selection in Linear Regression
The growing need for dealing with big data has made it necessary to find computationally efficient methods for identifying important factors to be considered in statistical modeling. In the linear model, the Lasso is an effective way of selecting variables using penalized regression. It has spawned substantial research in the area of variable selection for models that depend on a linear combina...
Analysis of Big Data
1 Theoretical foundation. 1.1 Statistical models and parameter spaces. 1.2 Limit theorems. 1.3 Estimation theory. 1.4 Likelihood-based estimation ...
Towards Conceptual Predictive Modeling for Big Data Framework
Predictive modeling is the process of creating a statistical model from data with the purpose of predicting future behavior. In recent years, the amount of available data has increased exponentially and “Big Data Analysis” is expected to be at the core of most future innovations. Due to the rapid development in the field of data analysis, there is still a lack of consensus on how one should app...